Vectorization Techniques for BlueGene/L’s Double FPU

Authors

  • Franz Franchetti
  • Stefan Kral
  • Juergen Lorenz
  • Christoph W. Ueberhuber
Abstract

This paper presents vectorization techniques tailored to the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit, which is a core element of the node ASICs of IBM's 360 Tflop/s supercomputer BlueGene/L. The paper focuses on the general-purpose basic-block vectorization methods provided by the Vienna MAP vectorizer. In addition, the paper introduces vectorization techniques specific to discrete signal transforms. The presented vectorization methods are evaluated in connection with the state-of-the-art automatic performance tuning systems SPIRAL and FFTW. The combination of automatic performance tuning and the presented vectorization techniques results in FFT codes tuned automatically to a single BlueGene/L processor that are up to 60% faster than the best scalar code generated by the respective systems and five times faster than the mixed-radix FFT implementation provided by the GNU Scientific Library (GSL).

Similar resources

Automatically Tuned FFTs for BlueGene/L's Double FPU

IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system featuring a peak performance of 360 Tflop/s. This system is supposed to lead the Top 500 list when it is installed in 2005 at the Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueG...

Vectorization techniques for the Blue Gene/L double FPU

This paper presents vectorization techniques tailored to meet the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit (FPU), which is a core element of the node application-specific integrated circuit (ASIC) chips of the IBM 360-teraflops Blue Gene/L supercomputer. This paper focuses on the general-purpose basic-block vectorization and optimiza...

Extracting Message Types from BlueGene/L’s Logs

In this paper we present results on extracting message types from the BlueGene/L supercomputer logs using the IPLoM (Iterative Partitioning Log Mining) algorithm. Previous work indicates that IPLoM shows promise as a message type extraction algorithm. We compared the results of IPLoM against manually produced message types for the BlueGene/L data. To provide a baseline of ...

Fourier Transforms for the BlueGene/L Communication Network

A computational kernel of particular importance for many scientific applications is the Fast Fourier Transform (FFT) of multi-dimensional data. A fundamental challenge is the design and implementation of such parallel numerical algorithms to utilise efficiently thousands of nodes. The BlueGene/L is a massively parallel high performance computer organised as a three-dimensional torus of compute ...

FFT Compiler Techniques

This paper presents compiler technology that targets general-purpose microprocessors augmented with SIMD execution units for exploiting data-level parallelism. Numerical applications are accelerated by automatically vectorizing blocks of straight-line code to be run on processors featuring two-way short vector SIMD extensions like Intel’s SSE2 on Pentium 4, SSE3 on Intel Prescott, AMD’s 3DNow...

Publication date: 2006